Abstract

Finding, counting and/or listing triangles (three vertices with three edges) in large graphs are
natural fundamental problems, which have recently received much attention because of their importance in
complex network analysis. We provide here a detailed state of the art on these problems, in a unified
way. We note that, until now, authors have paid surprisingly little attention to space complexity, despite
its both fundamental and practical interest. We give the space complexities of known algorithms and
discuss their implications. Then we propose improvements of a known algorithm, as well as a new
algorithm, which are time optimal for triangle listing and beat previous algorithms concerning space
complexity. They have the additional advantage of performing better on power-law graphs, which we
also study. We finally show with an experimental study that these two algorithms perform very well
in practice, making it possible to handle cases that were previously out of reach.

1 Introduction. 

A triangle in an undirected graph is a set of three vertices such that each possible edge between them is 
present in the graph. Following classical conventions, we call finding, counting and listing the problems of 
deciding if a given graph contains any triangle, counting the number of triangles in the graph, and listing 
all of them, respectively. We moreover call pseudo-listing the problem of counting for each vertex the 
number of triangles to which it belongs. We refer to all these problems as a whole by triangle problems. 

Triangle problems may be considered as classical, natural and fundamental algorithmic questions, and 
have been studied as such [...].

Moreover, they have recently gained much practical importance since they are central in so-called complex
network analysis, see for instance [...]. First, they are involved in the computation of one of the
main statistical properties used to describe large graphs met in practice, namely the clustering coefficient
[35]. The clustering coefficient of a vertex v (of degree at least 2) is the probability that any two randomly
chosen neighbors of v are linked together. It is computed by dividing the number of triangles containing v
by the number of possible edges between its neighbors, i.e. d(v)·(d(v)-1)/2 if d(v) denotes the number of
neighbors of v. One may then define the clustering coefficient of the whole graph as the average of this value for
all the vertices (of degree at least 2). Likewise, the transitivity ratio^1 [...] is defined as 3·N_Δ / N_∨, where
N_Δ denotes the number of triangles in the graph and N_∨ denotes the number of connected triples, i.e.
sets of three vertices with at least two edges, in the graph.
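
As an illustration of the clustering coefficient defined above, here is a minimal Python sketch. It assumes the graph is given as a dictionary `adj` mapping each vertex to the set of its neighbors; this encoding and the function names are choices made for this example only, not something prescribed by the paper.

```python
from itertools import combinations


def triangles_per_vertex(adj):
    """Return, for each vertex v, the number of triangles containing v.

    `adj` maps each vertex to the set of its neighbors (undirected, simple graph).
    """
    return {
        v: sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
        for v in adj
    }


def clustering_coefficient(adj):
    """Average, over vertices of degree >= 2, of triangles(v) / (d(v) choose 2)."""
    tri = triangles_per_vertex(adj)
    ratios = [
        tri[v] / (len(adj[v]) * (len(adj[v]) - 1) / 2)
        for v in adj
        if len(adj[v]) >= 2
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

On a triangle {0, 1, 2} with an extra vertex 3 attached to 2, vertices 0 and 1 have coefficient 1, vertex 2 has coefficient 1/3, and vertex 3 is ignored because its degree is 1.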

In the context of complex network analysis, triangles also play a key role in the study of motif 
occurrences, i.e. the presence of special (small) subgraphs in given (large) graphs. This has been studied 
in particular in protein interaction networks, where some motifs may correspond to biological functions, 
see for instance [28, ...]. Triangles often are building blocks of these motifs.

*LIAFA, CNRS and Universite Paris 7, 2 place Jussieu, 75005 Paris, France, latapy@liafa.jussieu.fr 
^1 Even though some authors make no distinction between the two notions, they are different, see for instance [...].
Both have their own advantages and drawbacks, but discussing this is out of the scope of this contribution.
Finally, triangle finding, counting, pseudo-listing and/or listing appear as key issues both from a
fundamental point of view and for practical purposes. The aim of this contribution is to review the
algorithms proposed until now for solving these problems with both a fundamental perspective (we discuss
asymptotic complexities and give detailed proofs) and a practical one (we discuss space requirements and
graph encoding, and we evaluate algorithms with some experiments).

We note that, until now, authors have paid surprisingly little attention to the space requirements of their
algorithms for triangle problems; this however is an important limitation in practice, and it also raises
interesting theoretical questions. We will therefore discuss this (all space complexity results stated in this
paper are new, though very simple in most cases), and we will propose space-efficient algorithms.

The paper is organised as follows. After a few preliminaries (Section 2), we begin with results on
finding, counting and pseudo-listing problems, between which basically no difference in complexity is
known (Section 3). Then we turn to the harder problem of triangle listing, in Section 4. In these
parts of the paper, we deal with both the general case (no assumption is made on the graph) and the
important case where the graph is sparse. Many very large graphs met in practice also have heterogeneous
degrees; we focus on this case in Section 5. Finally, we present experimental evaluations in Section 6. We
summarise the current state of the art and we point out the main perspectives in Section 7.

2 Preliminaries. 

Throughout the paper, we consider an undirected^2 graph G = (V, E) with n = |V| vertices and m = |E|
edges. We suppose that G is simple ((v, v) ∉ E for all v, and there is no multiple edge). We also assume
that m ∈ Ω(n); this is a classical convention which plays no role in our algorithms but makes complexity
formulae simpler. We denote by N(v) = {u ∈ V, (v, u) ∈ E} the neighborhood of v ∈ V and by
d(v) = |N(v)| its degree. We also denote by d_max the maximal degree in G: d_max = max_v {d(v)}.

Before entering the core of this paper, we need to discuss a few issues that will play an important
role in the following. They are necessary to make the discussion throughout the paper precise and rigorous.

Graph encodings. 

First note that we will always suppose that the graph is stored in central memory^3. There are basically
two ways to do this:

• G may be encoded by its adjacency matrix A defined by A_ij = 1 if (i, j) ∈ E, A_ij = 0 else. This
has a Θ(n^2) space cost. Since m may be up to Θ(n^2), this representation is space optimal in this
case (but it is not as soon as the graph is sparse, i.e. m ∈ o(n^2)), and makes it possible to test the
presence of any edge in Θ(1). Note however that one cannot run through N(v) in Θ(d(v)) time with
such a representation: one needs Θ(n) time. Since d(v) may be up to Θ(n), this is not a problem
in the general case.

• G may be encoded by a simple compact representation: for each vertex v we can access the set
of its neighbors N(v) and its degree d(v) in Θ(1) time and space cost. The set N(v) usually is
encoded using a linked list or an array, in order to be able to run through it in Θ(d(v)) time and
Θ(1) space. It may moreover be sorted (an order on the vertices is supposed to be given). This
representation has the advantage of being space efficient: it needs only Θ(m) space. However,
testing the presence of the edge (u, v) is in Θ(d(v)) time (Θ(log(d(v))) if N(v) is a sorted array).
We call any representation having these properties a simple compact representation of G. 
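
The two encodings and their edge tests can be sketched as follows. This is only an illustration: it assumes vertices are numbered 0 to n-1 and that edges are given as a list of pairs, and all function names are hypothetical.

```python
from bisect import bisect_left


def to_matrix(n, edges):
    """Adjacency matrix: Theta(n^2) space, constant-time edge test."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def to_sorted_adjacency(n, edges):
    """Sorted adjacency arrays: Theta(m) space, N(v) scanned in Theta(d(v))."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for neighbors in adj:
        neighbors.sort()
    return adj


def has_edge_matrix(A, u, v):
    """Edge test on the matrix: one memory access."""
    return A[u][v] == 1


def has_edge_sorted(adj, u, v):
    """Edge test on sorted arrays: binary search in N(v), O(log d(v)) time."""
    i = bisect_left(adj[v], u)
    return i < len(adj[v]) and adj[v][i] == u
```

The matrix answers edge queries in constant time but needs a full row scan to enumerate N(v); the sorted arrays make the opposite trade-off.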

^2 i.e. we make no difference between (u, v) and (v, u) in V × V.

^3 Approaches not requiring this, based on streaming algorithms for instance [...], or various methods to compress
the graph [...], also exist. This is however out of the scope of this paper.
Since the basic operations of such representations do not have the same complexity, they may play
a key role in algorithms using them. We will see that this is indeed the case in our context. We
note moreover that, in the context of large graph manipulation, the adjacency matrix often is intractable
because of its space requirements. This is why one generally uses (sorted) simple compact representations
in practice.

One may easily convert any simple compact representation of G into its adjacency array representation,
in time Θ(m) using Θ(n) additional space (it suffices to transform iteratively each set N(v) and
to free the memory used by the previous representation at each step). Moreover, once the adjacency
array representation of G is available, one may compute its sorted version in Θ(Σ_v d(v)·log(d(v))) ⊆
O(Σ_v d(v)·log(n)) = O(m·log(n)) time and Θ(1) additional space. One may therefore intuitively
make no difference between any simple compact representation of G and its sorted adjacency array
representation, as long as the overall algorithm complexity is in Ω(m·log(n)) time and Ω(n) space.

One may also obtain a simple compact representation of G from its adjacency matrix in time Θ(n^2)
and additional space Θ(n) (provided that one does not need the matrix anymore, else it costs Θ(m)).
This cost is not negligible in most cases, and thus we will suppose that algorithms that need the two
representations receive them both as inputs.

Finally, note that one may use more subtle structures to encode the sets N{v) for all v. Balanced 
trees and hashtables are the most classical ones. Since we focus on worst case analysis (see below), such 
encodings have no impact on our results, and so we make no difference between them and any other 
simple compact representation. 

(Additional) Space complexity. 

As explained above, storing the graph itself generally is in Θ(n^2) or Θ(m) space complexity. Moreover, the
space requirements of the algorithms we will study are, in most cases, lower than the space requirements 
of the graph storing. Therefore, their space complexity is the one of the chosen graph representation, 
which makes little sense. 

However, limiting the space needed by the algorithm in addition to the one needed to store the graph
often is a key issue in practice: the current main limitation in triangle problems on real-world complex
networks is space requirements. We illustrate this in Section 6.

For these reasons, the space complexities we discuss concern the additional space needed by the 
algorithm, i.e. not including the graph storage. As we will see, this notion makes a significant difference 
between various algorithms, and therefore also has a fundamental interest. 

Likewise, and following classical conventions, we do not include the size of the output in our space 
complexities. Otherwise, triangle listing would need Θ(n^3) space in the worst case, and pseudo-listing
would need Ω(n) space, which brings little information, if any.

Worst case complexity, and graph families. 

All the complexities we discuss in this paper are worst case complexities, in the sense that they are
bounds for the time and space needs of the algorithms, on any input. In most cases, these bounds are
tight (leading to the use of the Θ() notation, see for instance [...] for definitions). In other words, we say
that an algorithm is in Θ(f(n)) if there exists an instance of the input such that the algorithm runs with
this complexity (even if some instances induce lower complexity). In several cases, however, the worst
case complexity actually is the complexity for any input (in the case of Theorem lU for instance, and for 
most space complexities). 

It would also be of high interest to study the expected behavior of triangle algorithms, in addition to
the worst case one. This has been done in some cases; for instance, it is proved in [23] that vertex-iterator
(see Section 4.1) has expected time complexity in O(n^{5/3}). Obtaining such results however often is very
difficult, and their relevance for practical purposes is not always clear: the choice of a model for the average
input is a difficult task (in our context, random graphs would be an unsatisfactory choice [...]). We
therefore focus on worst case analysis, which has the advantage of giving guarantees on the behaviors of
algorithms, on any input.

Another interesting approach is to study (worst case) complexities on given graph families. This has
already been done on various cases, the most important ones probably being the sparse graphs, i.e. graphs
in which m is in o(n^2). This is motivated by the fact that most real-world complex networks lead to
such graphs, see for instance [...]. In general, it is even assumed that m is in O(n). Recent studies
however show that, despite the fact that m is small compared to n^2, it may be in ω(n) [...]. Other
classes of graphs have been considered, like for instance planar graphs: it is shown in [...] that one may
decide if any planar graph contains a triangle in O(n) time.

We do not detail all these results here. Since we are particularly interested in real-world complex
networks, we present in detail the results concerning sparse graphs throughout the paper. We also introduce
new results on power-law graphs (Section 5), which capture an important property met in practice. A
survey on available results on specific classes of graphs remains to be done, and is out of the scope of this 
paper. 

3 The fastest algorithms for finding, counting, and pseudo-listing. 

The fastest algorithm known for pseudo-listing relies on fast matrix product [...]. Indeed, if one
considers the adjacency matrix A of G then the value A^3_vv on the diagonal of A^3 is nothing but twice the
number of triangles to which v belongs, for any v. Finding, counting and pseudo-listing triangle problems
can therefore be solved in Θ(n^ω) time, where ω < 2.376 is the fast matrix product exponent [...]. This
was first noticed in 1978 [23], and currently no faster algorithm is known for any of these problems in the
general case, even for triangle finding (but this is no longer true when the graph is sparse, see Theorem 2
below).

This approach naturally needs the graph to be given by its adjacency matrix representation. Moreover,
it makes it necessary to compute and store the matrix A^2, leading to a Θ(n^2) space complexity in addition
to the adjacency matrix storage.

Theorem 1 ([23, ...]) Given the adjacency matrix representation of G, it is possible to solve triangle
finding, counting and pseudo-listing in Θ(n^ω) ⊆ O(n^{2.376}) time and Θ(n^2) space on G using fast matrix
product.
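
The matrix approach can be illustrated by the following Python sketch. It uses a naive cubic-time product as a stand-in for a fast matrix product, so it shows the principle (half the diagonal of A^3) but not the Θ(n^ω) running time; the function names are hypothetical.

```python
def mat_mul(X, Y):
    """Naive product of two square matrices given as lists of lists."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pseudo_listing_by_matrix(A):
    """Return T with T[v] = number of triangles containing v.

    A^2 is computed and stored (the Theta(n^2) additional space), then only
    the diagonal of A^3 = A^2 * A is evaluated and halved.
    """
    n = len(A)
    A2 = mat_mul(A, A)
    return [sum(A2[v][k] * A[k][v] for k in range(n)) // 2 for v in range(n)]
```

Summing the returned values and dividing by 3 gives the total number of triangles, and any non-zero entry answers the finding problem.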

This time complexity is the current state of our knowledge, as long as one makes no assumption on 
G. Note that no lower bound is known for this complexity; therefore faster algorithms may be designed. 

As we will see, there exist (slower) algorithms with lower space complexity for these problems. Some
of these algorithms only need a simple compact representation of G. They are derived from listing
algorithms, which we present in Section 4.

One can design faster algorithms if G is sparse. In [23], it was first proved that triangle finding,
counting, pseudo-listing and listing^4 can be solved in Θ(m^{3/2}) time and Θ(m) space. This result has
been improved in [...] using a property of the graph (namely arboricity) but the worst case complexities
were unchanged. No better result was known until 1995 [...], where the authors prove Theorem 2
below^5, which constitutes a significant improvement although it relies on very simple ideas. We detail
the proof and give a slightly different version, which will be useful in the following (similar ideas are used
in Section 4.3, and this proof permits a straightforward extension of this theorem in Section 5).

^4 The original results actually concern triangle finding but they can easily be extended to counting, pseudo-listing and
listing at no cost; we present such an extension in Section 4.2, Algorithm 4 (tree-listing).

^5 Again, the original results concerned triangle finding, but may easily be extended to pseudo-listing, see Algorithm 1
Input: any simple compact representation of G, its adjacency matrix A, and an integer K
Output: T such that T[v] is the number of triangles in G containing v

1. initialise T[v] to 0 for all v
2. for each vertex v with d(v) ≤ K:
  2a. for each pair {u, w} of neighbors of v:
    2aa. if A[u, w] then:
      2aaa. increment T[v]
      2aab. if d(u) > K and d(w) > K then increment T[u] and T[w]
      2aac. else if d(u) > K and w > v then increment T[u]
      2aad. else if d(w) > K and u > v then increment T[w]
3. let G' be the subgraph of G induced by {v, d(v) > K}
4. construct the adjacency matrix A' of G'
5. compute A'^3 using fast matrix product
6. for each vertex v with d(v) > K:
  6a. add to T[v] half the value in A'^3_vv

Algorithm 1: ayz-pseudo-listing. Counts for all v the triangles in G containing v.
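
The following Python sketch follows the structure of ayz-pseudo-listing under simplifying assumptions: vertices are numbered 0 to n-1, `adj` is a list of neighbor sets (membership tests on these sets stand in for the adjacency matrix A), and a naive matrix product replaces the fast one. When a triangle has two vertices of degree at most K and one of larger degree, the sketch credits the large-degree vertex only from the smaller of the two low-degree vertices, so that it is counted exactly once. The function name is hypothetical.

```python
from itertools import combinations


def ayz_pseudo_listing(adj, K):
    """Return T with T[v] = number of triangles containing v."""
    n = len(adj)
    deg = [len(adj[v]) for v in range(n)]
    T = [0] * n

    # Lines 2 to 2aad: triangles with at least one vertex of degree <= K.
    for v in range(n):
        if deg[v] > K:
            continue
        for u, w in combinations(adj[v], 2):
            if w in adj[u]:  # stands in for the test A[u, w]
                T[v] += 1
                if deg[u] > K and deg[w] > K:
                    T[u] += 1
                    T[w] += 1
                elif deg[u] > K and w > v:
                    T[u] += 1
                elif deg[w] > K and u > v:
                    T[w] += 1

    # Lines 3 to 6a: triangles whose three vertices all have degree > K.
    high = [v for v in range(n) if deg[v] > K]
    h = len(high)
    A1 = [[1 if high[j] in adj[high[i]] else 0 for j in range(h)] for i in range(h)]
    A2 = [
        [sum(A1[i][k] * A1[k][j] for k in range(h)) for j in range(h)]
        for i in range(h)
    ]
    for i, v in enumerate(high):
        T[v] += sum(A2[i][k] * A1[k][i] for k in range(h)) // 2
    return T
```

Whatever the value of K, the result should be the same; K only changes how the work is split between the two phases.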



Theorem 2 ([...]) Given any simple compact representation of G and its adjacency matrix, it is
possible to solve triangle finding, counting and pseudo-listing on G in O(m^{2ω/(ω+1)}) ⊆ O(m^{1.41}) time and
Θ(m^{4/(ω+1)}) ⊆ O(m^{1.185}) space; Algorithm 1 (ayz-pseudo-listing) achieves this if one takes
K ∈ Θ(m^{(ω-1)/(ω+1)}).

Proof: Let us first show that Algorithm 1 (ayz-pseudo-listing) solves pseudo-listing (and thus counting
and finding). Consider a triangle in G that contains a vertex with degree at most K; then it is discovered
in lines 2a and 2aa. Lines 2aaa to 2aad ensure that it is counted exactly once for each vertex it contains.
Consider now the triangles in which all the vertices have degree larger than K. Each of them induces a
triangle in G', and G' contains no other triangle. These triangles are counted using the matrix product
approach (lines 5, 6 and 6a), and finally all the triangles in G are counted for each vertex.

Let us now study the time complexity of Algorithm 1 (ayz-pseudo-listing) as a function of K. For each
vertex v with d(v) ≤ K, one counts the number of triangles containing v in Θ(d(v)^2) ⊆ O(d(v)·K)
thanks to the simple compact representation of G. If we sum over all the vertices in the graph this leads
to a time complexity in O(m·K) for lines 2 to 2aad. Now notice that there cannot be more than 2m/K
vertices v with d(v) > K. Line 4 constructs (in O(m + (m/K)^2) time, which plays no role in the global
complexity) the adjacency matrix of the subgraph G' of G induced by these vertices. Using fast matrix
product, line 5 computes the number of triangles for each vertex in G' in time O((m/K)^ω). Finally, we
obtain the overall time complexity of the algorithm: O(m·K + (m/K)^ω).

In order to minimize this, one has to search for a value of K such that m·K ∈ Θ((m/K)^ω). This leads
to K ∈ Θ(m^{(ω-1)/(ω+1)}), which gives the announced time complexity.

Concerning space complexity, the key point is that one has to construct A', A'^2 and A'^3. The matrix
A' may contain up to 2m/K vertices, leading to a Θ((m/K)^2) = Θ(m^{2(1-(ω-1)/(ω+1))}) = Θ(m^{4/(ω+1)})
space complexity.
□
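
The balancing of the two terms used in this proof can be written out explicitly. The derivation below is a sketch that ignores constant factors and uses only the symbols m, K and ω from the proof:

```latex
\begin{align*}
m \cdot K = \left(\frac{m}{K}\right)^{\omega}
  &\iff K^{\omega+1} = m^{\omega-1}
   \iff K = m^{\frac{\omega-1}{\omega+1}}, \\
m \cdot K
  &= m^{1+\frac{\omega-1}{\omega+1}} = m^{\frac{2\omega}{\omega+1}}, \\
\left(\frac{m}{K}\right)^{2}
  &= m^{2\left(1-\frac{\omega-1}{\omega+1}\right)} = m^{\frac{4}{\omega+1}} .
\end{align*}
```

With ω = 2.376, the two exponents evaluate to 2ω/(ω+1) ≈ 1.408 and 4/(ω+1) ≈ 1.185.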

Note that one may also use sparse matrix product algorithms, see for instance [...]. However, the
matrix A^2 may not be sparse (in particular if there are vertices with large degrees, which is often the

(ayz-pseudo-listing), and listing, see the ayz-listing algorithm. This was first proposed in [32, 33]. These algorithms have
also been generalized to longer cycles in [37], but this is out of the scope of this paper.
case in practice as discussed in Section 5). But algorithms may take benefit from the fact that one of the
two matrices involved in a product is sparse, and there also exist algorithms for products of more than
two sparse matrices. These approaches lead to algorithms whose efficiency depends on the exact relation
between n and m: which algorithm is the fastest depends on this relation. Discussing
this further therefore is quite complex, and it is out of the scope of this paper.

In conclusion, despite the fact that the algorithms presented in this section are asymptotically very 
fast, they have two important limitations. First, they have a prohibitive space cost, since the matrices 
involved in the computation (in addition to the adjacency matrix, but it is considered as the encoding of 
G itself) may need Θ(n^2) space. Moreover, the fast matrix product algorithms are quite intricate, which
leads to difficult implementations with high risks of errors. This also leads to large constant factors in 
the complexities, which have no importance at the asymptotic limit but may play a significant role in 
practice. 

For these reasons, and despite the fact that they clearly are of prime theoretical importance, these
algorithms have limited practical impact. Instead, one generally uses one of the listing algorithms (adapted
accordingly) that we detail now.

4 Time-optimal listing algorithms. 

First notice that there may be (n choose 3) ∈ Θ(n^3) triangles in G. Likewise, there may be Θ(m^{3/2})
triangles, since G may be a clique of √m vertices (thus containing (√m choose 3) ∈ Θ(m^{3/2}) triangles).
This gives the following lower bounds for the time complexity of any triangle listing algorithm.

Lemma 3 ([23]) Listing all triangles in G is in Ω(n^3) and Ω(m^{3/2}) time.

In this section, we first observe that the time complexity Θ(n^3) can easily be reached (Section 4.1).
However, Θ(m^{3/2}) is much better in the case of sparse graphs. We present more subtle algorithms that
reach this bound (Section 4.2). Again, space complexity is a key issue, and we discuss this for each
algorithm. We will see that algorithms proposed until now either rely on the use of adjacency matrices
and/or have a Ω(m) space complexity. We improve this by proposing algorithms that reach a Θ(n)
space complexity, while needing only a simple compact representation of G, and still in Θ(m^{3/2}) time
(Section 4.3).

4.1 Basic algorithms. 

One may trivially obtain a listing algorithm in Θ(n^3) (optimal) time with the matrix representation of
G by testing in Θ(1) time any possible triple of vertices. Moreover, this algorithm has the optimal space
complexity Θ(1).

Theorem 4 ([32] and folklore) Given the adjacency matrix representation of G, it is possible to
solve triangle listing in Θ(n^3) time and Θ(1) space using the direct testing of every triple of vertices.
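
The direct method is short enough to be sketched in a few lines of Python, assuming the matrix A is a list of lists of 0/1 values (the function name is hypothetical):

```python
def direct_listing(A):
    """Yield every triangle (u, v, w) with u < v < w by testing all triples."""
    n = len(A)
    for u in range(n):
        for v in range(u + 1, n):
            for w in range(v + 1, n):
                # Three constant-time lookups per triple, whatever the graph.
                if A[u][v] and A[u][w] and A[v][w]:
                    yield (u, v, w)
```

The three nested loops visit every triple exactly once, which is why the running time does not depend on the number of edges.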

This approach however has severe drawbacks. First, it needs the adjacency matrix of G. More importantly,
its complexity does not depend on the actual properties of G; it always needs Θ(n^3) computation
steps even if the graph contains very few edges. It must however be clear that, if almost all triples of
vertices form a triangle, no better asymptotic bound can be attained, and the simplicity of this algorithm
makes it very efficient in these cases.

In order to obtain faster algorithms on sparse graphs, while keeping the implementation very simple,
one often uses the following algorithms. The first one, introduced in [23] and called vertex-iterator in
[32, 33], consists in iterating Algorithm 2 (vertex-listing) on each vertex of G. The second one, which seems
to be the most widely used algorithm^6, consists in iterating Algorithm 3 (edge-listing) over each edge in
G. It was also first introduced in [23] and discussed in [32, 33] where the authors call it edge-iterator.

Input: any simple compact representation of G, its adjacency matrix A, and a vertex v 
Output: all the triangles to which v belongs 
1. for each pair {u, w} of neighbors of v:
  1a. if A_uw = 1 then output triangle {u, v, w}

Algorithm 2: vertex-listing. Lists all the triangles containing a given vertex [23].



Input: any sorted simple compact representation of G, and an edge (u, v) of G 
Output: all the triangles in G containing (u, v) 
1. for each w in N(u) ∩ N(v):
  1a. output triangle {u, v, w}

Algorithm 3: edge-listing. Lists all the triangles containing a given edge [23].



Theorem 5 ([23, 32, 33]) Given any simple compact representation of G and its adjacency matrix, it
is possible to list all its triangles in Θ(Σ_v d(v)^2), Θ(m·d_max), Θ(m·n), and Θ(n^3) time and Θ(1) space;
vertex-iterator achieves this.

Proof: The fact that Algorithm 2 (vertex-listing) lists all the triangles to which a vertex v belongs is
straightforward. Then, iterating over all vertices gives three times each triangle; if one wants each triangle
only once it is sufficient to restrict the output of triangles to the ones for which η(w) > η(v) > η(u), for
any injective numbering η() of the vertices.

Thanks to the simple compact representation of G, the pairs of neighbors of v may be computed
in Θ(d(v)^2) time and Θ(1) space (this would be impossible with the adjacency matrix only). Thanks
to the adjacency matrix, the test in line 1a may be processed in Θ(1) time and space (this would be
impossible with the simple compact representation only). The time complexity of Algorithm 2
(vertex-listing) therefore is in Θ(d(v)^2) time and Θ(1) space. The Θ(Σ_v d(v)^2) time and Θ(1) space complexity
of the overall algorithm follows. Moreover, we have Θ(Σ_v d(v)^2) ⊆ O(Σ_v d(v)·d_max) = O(m·d_max) ⊆
O(m·n) ⊆ O(n^3), and all these complexities may be attained in the worst case (clique of n vertices), hence
the results. □
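
A possible Python rendering of vertex-iterator is given below. It assumes vertices are numbered 0 to n-1 (this numbering plays the role of η), that `adj` is a list of neighbor lists and `A` the adjacency matrix; these names are chosen for the example only.

```python
def vertex_iterator(adj, A):
    """Yield each triangle once, as (u, v, w) with u < v < w.

    For each vertex v, every pair of neighbors is tested in constant time
    with the matrix; a triangle is reported only when v is its middle vertex.
    """
    for v in range(len(adj)):
        neighbors = adj[v]
        for i in range(len(neighbors)):
            for j in range(i + 1, len(neighbors)):
                u, w = neighbors[i], neighbors[j]
                if u > w:
                    u, w = w, u
                if u < v < w and A[u][w] == 1:
                    yield (u, v, w)
```

The index-based loops avoid materialising the list of neighbor pairs, in the spirit of the constant additional space stated in the theorem.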



Theorem 6 ([23, 32, 33] and folklore) Given any sorted simple compact representation of G, it is
possible to list all its triangles in Θ(m·d_max), Θ(m·n) and Θ(n^3) time and Θ(1) space; the edge-iterator
algorithm achieves this.

Proof: The correctness of the algorithm is immediate. One may proceed like in the proof of Theorem 5
to obtain each triangle only once.

Each edge (u, v) is treated in time Θ(d(u) + d(v)) (because N(u) and N(v) are sorted) and Θ(1) space.
We have d(u) + d(v) ∈ O(d_max), therefore the overall complexity is in O(m·d_max) ⊆ O(m·n) ⊆ O(n^3).
In the worst case (clique of n vertices) all these complexities are tight. □
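
The merge of two sorted neighborhoods that underlies edge-iterator can be sketched as follows, assuming vertices are numbered 0 to n-1 and `adj` is a list of sorted neighbor lists (an illustrative encoding and a hypothetical function name):

```python
def edge_iterator(adj):
    """Yield each triangle once, as (u, v, w) with u < v < w.

    Each edge (u, v) with u < v is processed by a linear merge of the sorted
    lists N(u) and N(v); only common neighbors w > v are reported.
    """
    for u in range(len(adj)):
        for v in adj[u]:
            if u >= v:
                continue  # handle each undirected edge once
            nu, nv = adj[u], adj[v]
            i = j = 0
            while i < len(nu) and j < len(nv):
                if nu[i] < nv[j]:
                    i += 1
                elif nu[i] > nv[j]:
                    j += 1
                else:
                    if nu[i] > v:
                        yield (u, v, nu[i])
                    i += 1
                    j += 1
```

No adjacency matrix is needed here: the sorted order alone makes the intersection computable in Θ(d(u) + d(v)) time with two indices.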

^6 It is for instance implemented in the widely used complex network analysis software Pajek [...].
First note that these algorithms are optimal in the worst case, just like the direct method (Lemma 3
and Theorem 4). However, they are much more efficient on sparse graphs, in particular if the maximal
degree is low^7, since they both are in Θ(m·d_max) time. If the maximal degree is a constant,
vertex-iterator even is in Θ(n) time. Moreover, both algorithms only need Θ(1) space, which makes them very
interesting from this perspective (we will see that there is no known faster algorithm with this space
requirement).

However, vertex-iterator has a severe drawback: it needs the adjacency matrix of G and a simple
compact representation. Instead, edge-iterator only needs a sorted simple compact representation, which
is often available in practice^8. Moreover, edge-iterator runs in Θ(1) space, which makes it very compact.
For these two reasons, and because of its simplicity, it is widely used in practice.

The performance of these algorithms however is quite poor when the maximal degree is unbounded,
and in particular if it grows like n. They may even be asymptotically sub-optimal on sparse graphs and/or
on graphs with some vertices of high degree, which often appear in practice (we discuss this further in
Section 5). It is however possible to design time-optimal listing algorithms for sparse graphs, which we
detail now.

4.2 Time-optimal listing algorithms for sparse graphs. 

Several algorithms have been proposed that reach the Θ(m^{3/2}) bound of Lemma 3, and thus are time
optimal on sparse graphs (note that this is also optimal for dense graphs, but we have seen in Section 4.1
much simpler algorithms for these cases). Back in 1978, an algorithm was proposed to find a triangle in
Θ(m^{3/2}) time and Θ(m) space [23]. Therefore it is slower than the ones discussed in Section 3 for finding,
but it may be extended to obtain a listing algorithm with the same complexity. We first present this
below. Then, we detail two simpler solutions with this complexity, proposed recently in [...]. The
first one consists in a simple extension of Algorithm 1 (ayz-pseudo-listing); the other one, named forward,
has the advantage of being very efficient in practice [32, 33]. Moreover, we show in Section 4.3 that it
may be slightly modified to reach a Θ(n) space cost.

An approach based on covering trees [23].

We use here the classical notions of covering trees and connected components, as defined for instance
in [...]. Since they are very classical, we do not recall them. We just note that a covering tree of each
connected component of any graph may be computed in time linear in the number of edges of this graph,
and space linear in its number of vertices (typically using a breadth-first search). One then has access to
the father of any vertex in Θ(1) time and space.

In [23], the authors propose a triangle finding algorithm in Θ(m^{3/2}) time and Θ(m) space. We present
here a simple extension of this algorithm to solve triangle listing with the same complexity. To achieve
this, we need the following lemma, which is a simple extension of Lemma 4 in [23].

Lemma 7 ([23]) Let us consider a covering tree for each connected component of G, and a triangle t in
G having an edge in one of these trees. Then there exists an edge (u, v) in E but in none of these trees,
such that t = {u, v, father(v)}.

Proof: Let t = {x, y, z} be a triangle in G, and let T be the tree that contains an edge of t. We can
suppose without loss of generality that this edge is (x, y = father(x)). Two cases have to be considered.
First, if (x, z) ∉ T then it is in none of the trees, and taking v = x and u = z satisfies the claim. Second,

^7 We also note that another Θ(m·n) time algorithm was proposed in [29] for a more general problem. In the case of
triangles, it does not improve vertex-iterator and edge-iterator, which are much simpler, therefore we do not detail it here.
^8 Recall that one may sort the simple compact representation of G in Θ(m·log(n)) time and Θ(n) space, if needed.
if (x, z) ∈ T then we have father(z) = x (because father(x) = y ≠ z). Moreover, (y, z) ∉ T (else T would
contain a cycle, namely t). Therefore taking v = z and u = y satisfies the claim. □



Input: any simple compact representation of G, and its adjacency matrix A 
Output: all the triangles in G 
1. while there remains an edge in E:
  1a. compute a covering tree for each connected component of G
  1b. for each edge (u, v) in none of these trees:
    1ba. if (father(u), v) ∈ E then output triangle {u, v, father(u)}
    1bb. else if (father(v), u) ∈ E then output triangle {u, v, father(v)}
  1c. remove from E all the edges in these trees

Algorithm 4: tree-listing. Lists all the triangles in a graph [23].
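
The following Python sketch illustrates tree-listing under explicit assumptions that differ from the setting of the paper: the graph is given as a dictionary `adj` of neighbor sets with integer vertices, edge tests use these sets instead of the adjacency matrix, and the graph is copied so that edges can be removed (so the sketch does not achieve the stated space bound). Covering trees are computed by breadth-first search; with such trees, when both tests of lines 1ba and 1bb succeed the two fathers coincide, so the if/else structure of the pseudocode reports each triangle once. The function name is hypothetical.

```python
from collections import deque


def tree_listing(adj):
    """Return the list of triangles of the graph, each as a frozenset."""
    g = {v: set(nb) for v, nb in adj.items()}  # working copy, consumed below
    triangles = []
    while any(g[v] for v in g):
        # 1a. breadth-first covering tree of each connected component
        father = {}
        for root in g:
            if root in father:
                continue
            father[root] = None
            queue = deque([root])
            while queue:
                x = queue.popleft()
                for y in g[x]:
                    if y not in father:
                        father[y] = x
                        queue.append(y)
        # 1b. examine every edge that belongs to none of the trees
        for u in g:
            for v in g[u]:
                if u >= v or father[u] == v or father[v] == u:
                    continue  # edge already seen, or tree edge
                fu, fv = father[u], father[v]
                if fu is not None and v in g[fu]:
                    triangles.append(frozenset((u, v, fu)))
                elif fv is not None and u in g[fv]:
                    triangles.append(frozenset((u, v, fv)))
        # 1c. remove all tree edges
        for v, f in father.items():
            if f is not None:
                g[v].discard(f)
                g[f].discard(v)
    return triangles
```

Each pass removes at least one edge as long as some edge remains, so the loop terminates; the number of passes is what the proof of Theorem 8 bounds.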



This lemma shows that, given a covering tree of each connected component of G, one may find triangles
by checking for each edge (u, v) that belongs to none of these trees if {u, v, father(v)} is a triangle. Then,
all the triangles containing (u, father(u)) are discovered. This leads to Algorithm 4 (tree-listing), and to
the following result (which is a direct extension of the one concerning triangle finding described in [23]).

Theorem 8 ([23]) Given any simple compact representation of G and its adjacency matrix, it is possible
to list all its triangles in Θ(m^{3/2}) time and Θ(n) space; Algorithm 4 (tree-listing) achieves this.

Proof: Let us first prove that the algorithm is correct. It is clear that the algorithm may only output 
triangles. Suppose that one is missing. But all its edges have been removed when the computation 
stops, and so (at least) one of its edges was in a tree at some step. Let us consider the first such step 
(therefore the three edges of the triangle are present). Lemma 7 says that there exists an edge satisfying
the condition tested in lines 1b and 1ba, and thus the triangle was discovered at this step. Finally, we
reach a contradiction, and thus all triangles have been discovered. 

Now let us focus on the time complexity. Following [23], let c denote the number of connected
components at the current step of the algorithm. The value of c increases during the computation, until
it reaches c = n. Two cases have to be considered. First suppose that c ≤ n - √m. During this step of
the algorithm, n - c ≥ n - (n - √m) = √m edges are removed. And thus there can be no more than
m/√m = √m such steps. Consider now the other case, c > n - √m. The maximal degree then is at most
n - c < n - (n - √m) = √m, and, since the degree of each vertex (of non-null degree) decreases at each
step, there can be no more than √m such steps. Finally, the total number of steps is bounded by 2·√m.
Moreover, each step costs O(m) time: the test in line 1ba is in Θ(1) time thanks to the adjacency matrix,
and line 1b finds the O(m) edges on which it is run in Θ(m) time thanks to the father() relation which
is in Θ(1) time. This leads to the O(m^{3/2}) time complexity, and, from Lemma 3, this bound is tight.

Finally, let us focus on the space complexity. Suppose that removing an edge (u, v) is done by setting
$A_{uv}$ and $A_{vu}$ to 0, but without changing the simple compact representation. Then, the actual presence
of an edge in the simple compact representation can be tested with only a constant additional cost by
checking that the corresponding entry in the matrix is equal to 1. Therefore, this way of removing edges
induces no significant additional time cost, while allowing a computation in $\Theta(n)$ space (needed for the
trees). □
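To make the steps of tree-listing concrete, here is a minimal Python sketch. It rests on several assumptions that are ours and not the paper's: vertices are the integers 0..n-1, neighbour sets stand in for the adjacency matrix (so the sketch uses more than $\Theta(n)$ space), covering trees are built by breadth-first search, and the test on father(v) is performed independently of the test on father(u) (rather than as an "else") so that two distinct triangles sharing the same non-tree edge are both reported.

```python
from collections import deque

def tree_listing(n, edges):
    """Sketch of tree-listing on vertices 0..n-1; returns a list of frozensets."""
    # Neighbour sets play the role of the adjacency matrix (assumption).
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = sum(len(a) for a in adj) // 2
    triangles = []
    while remaining > 0:                       # line 1
        father = [None] * n                    # line 1a: BFS covering forest
        seen = [False] * n
        tree_edges = []
        for root in range(n):
            if seen[root]:
                continue
            seen[root] = True
            queue = deque([root])
            while queue:
                x = queue.popleft()
                for y in adj[x]:
                    if not seen[y]:
                        seen[y] = True
                        father[y] = x
                        tree_edges.append((x, y))
                        queue.append(y)
        for u in range(n):                     # line 1b: edges in no tree
            for v in adj[u]:
                if u < v and father[u] != v and father[v] != u:
                    fu, fv = father[u], father[v]
                    if fu is not None and fu in adj[v]:
                        triangles.append(frozenset((u, v, fu)))
                    # Assumption: also test father(v), unless it would
                    # report the same triangle a second time.
                    if fv is not None and fv != fu and fv in adj[u]:
                        triangles.append(frozenset((u, v, fv)))
        for x, y in tree_edges:                # line 1c
            adj[x].discard(y)
            adj[y].discard(x)
        remaining -= len(tree_edges)
    return triangles
```

Each pass removes at least one edge while edges remain, so the loop terminates; a triangle is reported during the first pass in which one of its edges belongs to a tree.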

The space complexity obtained here is very good (and we will see that we are unable to obtain better
ones), but it relies on the fact that the graph is given both in its adjacency matrix representation and
in a simple compact one. This significantly reduces the practical relevance of this approach concerning
reduced space complexity. We will see in the next section algorithms that have the same time and space
complexities but need only a simple compact representation of G.

An extension of Algorithm 1 (ayz-pseudo-listing) [3, 4, 32, 33].

The fastest known algorithm for finding, counting, and pseudo-listing triangles, namely Algorithm 1
(ayz-pseudo-listing), was proposed in [3, 4] and we detailed it in Section 3. As first proposed in [32, 33], it is
easy to modify it to obtain a listing algorithm, namely Algorithm 5 (ayz-listing).

Input: any simple compact representation of G, its adjacency matrix A, and an integer K
Output: all the triangles in G

1. for each vertex v with $d(v) \le K$:
  1a. output all triangles containing v with Algorithm 2 (vertex-listing), without duplicates
2. let G' be the subgraph of G induced by $\{v,\ d(v) > K\}$
3. compute a sorted simple compact representation of G'
4. list all triangles in G' using Algorithm 3 (edge-listing)

Algorithm 5: ayz-listing. Lists all the triangles in a graph [3, 4, 32, 33].



Theorem 9 ([3, 4, 32, 33]) Given any simple compact representation of G and its adjacency matrix, it is
possible to list all its triangles in $\Theta(m^{3/2})$ time and $O(m)$ space; Algorithm 5 (ayz-listing) achieves this if
one takes $K \in \Theta(\sqrt{m})$.

Proof: First recall that one may sort the simple compact representation of G in $O(m \cdot \log(n))$ time and
$\Theta(1)$ space. This has no impact on the overall complexity of Algorithm 5 (ayz-listing), thus we suppose
in this proof that the representation is sorted.

In a way similar to the proof of Theorem 2, let us first express the complexity of Algorithm 5
(ayz-listing) in terms of K. Using the $\Theta(d(v)^2)$ complexity of Algorithm 2 (vertex-listing) we obtain that
lines 1 and 1a have a cost in
$O(\sum_{v,\, d(v) \le K} d(v)^2) \subseteq O(\sum_{v,\, d(v) \le K} d(v) \cdot K) \subseteq O(m \cdot K)$ time. Moreover,
they have a $\Theta(1)$ space cost (Theorem 5).

Since we may suppose that the simple compact representation of G is sorted, line 3 can be achieved
in $O(m)$ time. The number of vertices in G' is in $O(m/K)$ and it may be a clique, thus the space needed
for G' is in $O((m/K)^2)$.

Finally, the overall time complexity is in $O(m \cdot K + m \cdot \frac{m}{K})$. The optimum is attained with K in $\Theta(\sqrt{m})$,
leading to the announced time complexity (which is tight, by the same lower-bound lemma as in the proof
of Theorem 8). The space complexity then is $O((m/\sqrt{m})^2) = O(m)$. □
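The two phases of ayz-listing can be sketched in Python as follows. This is only an illustration under our own assumptions: the graph is given as a dict mapping each (integer) vertex to the set of its neighbours, set membership replaces the adjacency matrix, a set intersection replaces the sorted merge of edge-listing, and the rule used to avoid duplicates (a triangle with at least one low-degree vertex is reported from its smallest low-degree vertex) is one possible choice.

```python
from itertools import combinations

def ayz_listing(adj, K):
    """adj: dict vertex -> set of neighbours; K: degree threshold."""
    deg = {v: len(adj[v]) for v in adj}
    out = []
    # Line 1: low-degree vertices, all pairs of neighbours (vertex-listing).
    for v in adj:
        if deg[v] <= K:
            for u, w in combinations(adj[v], 2):
                # Report from the smallest low-degree corner only.
                if w in adj[u] and all(deg[x] > K or x > v for x in (u, w)):
                    out.append((v, u, w))
    # Lines 2-4: subgraph induced by high-degree vertices, edge by edge.
    high = {v: {u for u in adj[v] if deg[u] > K}
            for v in adj if deg[v] > K}
    for v in high:
        for u in high[v]:
            if v < u:
                out.extend((v, u, w) for w in high[v] & high[u] if w > u)
    return out
```

Any value of K yields the same set of triangles; K only shifts work between the two phases, which is what the balance $m \cdot K = m \cdot m/K$ in the proof expresses.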

Again, this result has a significant space cost: it needs the adjacency matrix of G, and, even then, it
needs $O(m)$ additional space. Moreover, it relies on the use of a parameter, K, which may be difficult to
choose in practice: though Theorem 9 says that it must be in $\Theta(\sqrt{m})$, this makes little sense when one
considers a given graph. We discuss this issue further in Section 6.

The forward fast algorithm [32, 33].

In [32, 33], the authors propose another algorithm with optimal time complexity and a $\Theta(m)$ space cost, while
needing only a simple compact representation of G. We now present it in detail. We give a new proof of
the correctness and complexity of this algorithm, in order to be able to extend it in the next sections (in
particular in Section 5).
Input: any simple compact representation of G
Output: all the triangles in G

1. number the vertices with an injective function $\eta()$
   such that $d(u) > d(v)$ implies $\eta(u) < \eta(v)$ for all u and v
2. let A be an array of n sets initially equal to $\emptyset$
3. for each vertex v taken in increasing order of $\eta()$:
  3a. for each $u \in N(v)$ with $\eta(u) > \eta(v)$:
    3aa. for each w in $A[u] \cap A[v]$: output triangle {u, v, w}
    3ab. add v to A[u]

Algorithm 6: forward. Lists all the triangles in a graph.



Theorem 10 ([32, 33]) Given any simple compact representation of G, it is possible to list all its triangles
in $\Theta(m^{3/2})$ time and $\Theta(m)$ space; Algorithm 6 (forward) achieves this.

Proof: For all vertices x, let us denote by $A(x) = \{y \in N(x),\ \eta(y) < \eta(x)\}$ the set of neighbors y
of x with number $\eta(y)$ smaller than the one of x itself. For any triangle t = {a, b, c} one can suppose
without loss of generality that $\eta(c) < \eta(b) < \eta(a)$. One may then discover t by discovering that c is in
$A(a) \cap A(b)$.

This is what Algorithm 6 (forward) does. To show this it suffices to show that $A[u] \cap A[v] = A(u) \cap A(v)$
when computed in line 3aa.

First notice that when one enters the main loop (line 3) for v, the set A[v] contains all the vertices
in A(v). Indeed, any u in A(v) was previously treated by the main loop since $\eta(u) < \eta(v)$, and during this
treatment lines 3a and 3ab ensured that v was added to A[u], that is, with the roles exchanged, that u was
added to A[v] (just replace u by v and v by u in the pseudocode).
Moreover, A[v] contains no other element, and thus it is exactly A(v) when one enters the main loop.

Likewise, when entering the main loop for v, A[u] is not equal to A(u), but it contains all the
vertices w such that $\eta(w) < \eta(v)$ and that belong to A(u). Therefore, the intersections are equal:
$A[u] \cap A[v] = A(u) \cap A(v)$, and thus the algorithm is correct.

If we turn to the time complexity, first notice that line 1 can be achieved in $O(n \cdot \log(n))$ (and even
in $\Theta(n)$) time and $\Theta(n)$ space. This plays no role in the following.

Now, note that lines 3 and 3a are nothing but a loop over all edges, thus in $O(m)$. Inside the loop,
the expensive operation is the intersection computation. To obtain the claimed complexity, it suffices to
show that both A[u] and A[v] contain $O(\sqrt{m})$ vertices (since each structure A[x] is trivially sorted by
construction, this is sufficient to ensure that the intersection computation is in $O(\sqrt{m})$).

For any vertex x, by definition of A(x) and $\eta()$, A(x) is included in the set of neighbors of x with
degree at least d(x). Suppose x has $\omega(\sqrt{m})$ such neighbors: $|A(x)| \in \omega(\sqrt{m})$. But all these vertices have
degree at least equal to the one of x, with $d(x) \ge |A(x)|$, and thus they have all together $\omega(m)$ edges,
which is impossible. Therefore one must have $|A(x)| \in O(\sqrt{m})$, and since $A[x] \subseteq A(x)$ this proves the
$O(m^{3/2})$ time complexity. This bound is tight, by the same lower-bound lemma as in the proof of Theorem 8.

The space complexity is obtained when one notices that each edge induces a $\Theta(1)$ space in A, leading
to a global space in $\Theta(m)$. □
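As an illustration of the proof above, the following Python sketch follows the pseudocode of forward line by line. The input format (a dict mapping integer vertices to their neighbours), the tie-breaking on vertex labels, and the use of Python lists for the sets A[x] are assumptions of ours; each list A[x] stays sorted by $\eta$ because vertices are appended in increasing order of $\eta$, which is what allows the merge-style intersection.

```python
def forward(adj):
    """adj: dict vertex -> iterable of neighbours (undirected, simple)."""
    # Line 1: number vertices by decreasing degree (ties broken by label).
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    eta = {v: i for i, v in enumerate(order)}
    A = {v: [] for v in adj}              # line 2; lists stay sorted by eta
    triangles = []
    for v in order:                        # line 3
        for u in adj[v]:                   # line 3a
            if eta[u] > eta[v]:
                # Line 3aa: merge-intersect the two eta-sorted lists.
                Au, Av = A[u], A[v]
                i = j = 0
                while i < len(Au) and j < len(Av):
                    a, b = eta[Au[i]], eta[Av[j]]
                    if a < b:
                        i += 1
                    elif a > b:
                        j += 1
                    else:
                        triangles.append((Au[i], v, u))
                        i += 1
                        j += 1
                A[u].append(v)             # line 3ab
    return triangles
```

The lists in A hold one entry per edge in total, which mirrors the $\Theta(m)$ space bound of the theorem.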

Compared to Algorithm 5 (ayz-listing), this algorithm has several advantages (although it has the
same asymptotic time and space complexities). It is very simple and easy to implement, which also
implies, as shown in [32, 33], that it is very efficient in practice. Moreover, it does not have the drawback
of depending on a parameter K, central in Algorithm 5 (ayz-listing). Finally, we show in the next sections
that it may be slightly modified to obtain a $\Theta(n)$ space complexity (Section 4.3), and that even better
performances can be proved if one considers power-law graphs (Section 5).
4.3 Time-optimal compact algorithms for sparse graphs.

This section is devoted to listing algorithms that have very low space requirements, both in terms of the
given representation of G and in terms of the additional space needed. We will obtain two algorithms
reaching a $\Theta(n)$ space cost while needing only a simple compact representation of G, and in optimal
$\Theta(m^{3/2})$ time.

A compact version of Algorithm 6 (forward).

Thanks to the proof we gave of Theorem 10, it is now easy to modify Algorithm 6 (forward) in order to
significantly improve its space complexity. This leads to the following result.

Input: any simple compact representation of G
Output: all the triangles in G

1. number the vertices with an injective function $\eta()$
   such that $d(u) > d(v)$ implies $\eta(u) < \eta(v)$ for all u and v
2. sort the simple compact representation according to $\eta()$
3. for each vertex v taken in increasing order of $\eta()$:
  3a. for each $u \in N(v)$ with $\eta(u) > \eta(v)$:
    3aa. let u' be the first neighbor of u, and v' the one of v
    3ab. while there remain untreated neighbors of u and v and $\eta(u') < \eta(v)$ and $\eta(v') < \eta(v)$:
      3aba. if $\eta(u') < \eta(v')$ then set u' to the next neighbor of u
      3abb. else if $\eta(u') > \eta(v')$ then set v' to the next neighbor of v
      3abc. else:
        3abca. output triangle {u, v, u'}
        3abcb. set u' to the next neighbor of u
        3abcc. set v' to the next neighbor of v

Algorithm 7: compact-forward. Lists all the triangles in a graph.



Theorem 11 Given any simple compact representation of G, it is possible to list all its triangles in
$\Theta(m^{3/2})$ time and $\Theta(n)$ space; Algorithm 7 (compact-forward) achieves this.

Proof: Recall that, as explained in the proof of Theorem 10, when one computes the intersection of A[v]
and A[u] (line 3aa of Algorithm 6 (forward)), A[v] is the set of neighbors of v with number lower than $\eta(v)$,
and A[u] is the set of neighbors of u with number lower than $\eta(v)$. If the adjacency structures encoding
the neighborhoods are sorted according to $\eta()$, we then have that A[v] is nothing but the beginning of
N(v), truncated when we reach a vertex v' with $\eta(v') > \eta(v)$. Likewise, A[u] is N(u) truncated at u'
such that $\eta(u') > \eta(v)$.

Algorithm 7 (compact-forward) uses this. Indeed, lines 3ab to 3abcc are nothing but the computation
of the intersection of A[v] and A[u], which are stored at the beginning of the adjacency
structures thanks to the sort done in line 2. All this has no impact on the asymptotic time cost, and now the A
structure does not have to be explicitly stored.

Notice now that line 1 has an $O(n \cdot \log(n))$ time and $\Theta(n)$ space cost. Moreover, sorting the simple
compact representation of G (line 2) is in $O(m \cdot \log(n))$ time and $\Theta(1)$ space. These time complexities
play no role in the overall complexity, but the space complexities induce a $\Theta(n)$ space cost for the overall
algorithm.

Finally, the time cost is the same as the one of Algorithm 6 (forward), and the space cost is in $\Theta(n)$. □
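The inner loop of compact-forward (lines 3aa to 3abcc) is a two-pointer merge over the beginnings of two sorted adjacency arrays. The sketch below assumes, for simplicity, that vertices are the integers 0..n-1, that they have already been numbered so that $\eta$ is the identity, and that each adjacency list is sorted; these input conventions are ours. The output is correct for any numbering, and only the running-time bound relies on the numbering following decreasing degrees.

```python
def compact_forward(nbrs):
    """nbrs[v]: sorted list of neighbours of v, vertices being 0..n-1.

    Assumption: eta is the identity, i.e. vertices are already numbered
    (ideally by decreasing degree) and adjacency lists are sorted.
    """
    triangles = []
    for v in range(len(nbrs)):                 # line 3
        for u in nbrs[v]:                      # line 3a
            if u > v:
                Nu, Nv = nbrs[u], nbrs[v]
                i = j = 0                      # line 3aa
                # Line 3ab: scan only neighbours numbered below v.
                while (i < len(Nu) and j < len(Nv)
                       and Nu[i] < v and Nv[j] < v):
                    if Nu[i] < Nv[j]:
                        i += 1
                    elif Nu[i] > Nv[j]:
                        j += 1
                    else:
                        triangles.append((Nu[i], v, u))
                        i += 1
                        j += 1
    return triangles
```

No auxiliary structure proportional to m is allocated: the only extra state is the pair of cursors i and j.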
In practice, this result means that one may encode vertices by integers, with the property that this
numbering goes from highest degree vertices to lowest ones, then store the graph in a simple compact
representation, sort it, and compute the triangles using Algorithm 7 (compact-forward). In such a framework,
it is important to notice that the algorithm runs in $\Theta(1)$ space, since line 1, responsible for the
$\Theta(n)$ cost, is unnecessary. On the other hand, if one wants to keep the original numbering of the vertices,
then one has to store the function $\eta()$ and renumber the vertices back after the triangle computation.
This has a $\Theta(n)$ space cost (and no significant time cost). Going further, if one wants to restore the
initial order inside the simple sorted representation, then one has to sort it back if it was sorted before
the computation, and even to store a copy of it (then in $\Theta(m)$ space) if it was unsorted.
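The encoding described in this paragraph may be sketched as follows in Python. The function name, the dict-based input and the tie-breaking rule are hypothetical choices made for the illustration; the returned list `order` plays the role of the stored function $\eta()$, since `order[i]` is the original label of the vertex renumbered i.

```python
def relabel_by_degree(adj):
    """adj: dict original_label -> set of neighbours.

    Returns (nbrs, order): nbrs[i] is the sorted neighbour list of the
    vertex renumbered i, and order[i] is its original label.
    """
    # Highest degree first; ties broken by original label (assumption).
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    eta = {v: i for i, v in enumerate(order)}
    nbrs = [sorted(eta[u] for u in adj[v]) for v in order]
    return nbrs, order
```

A triangle (a, b, c) found on the renumbered graph is mapped back to the original labels as (order[a], order[b], order[c]).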

A new algorithm. 

The algorithms discussed until now basically rely on the fact that they avoid considering each pair of 
neighbors of high degree vertices, which would have a prohibitive cost. They do so by managing low degree 
vertices first, which has the consequence that most edges involved in the highest degrees have already 
been treated when the algorithm comes to these vertices. Here we take a quite different approach. First 
we design an algorithm able to efficiently list the triangles of high degree vertices. Then, we use it in 
an algorithm similar to Algorithm 5 (ayz-listing), but that both avoids adjacency matrix representation
and reaches a $\Theta(n)$ space cost.

First note that we already have an algorithm listing all the triangles containing a given vertex v,
namely Algorithm 2 (vertex-listing). This algorithm is in $\Theta(1)$ space, but it is inefficient on high
degree vertices, since it needs $\Theta(d(v)^2)$ time. Our improved listing algorithm relies on an equivalent to
Algorithm 2 (vertex-listing) that avoids this.

Input: any simple compact representation of G, and a vertex v
Output: all the triangles to which v belongs
1. create an array A of n booleans and set them to false
2. for each vertex u in N(v), set A[u] to true
3. for each vertex u in N(v):
  3a. for each vertex w in N(u):
    3aa. if A[w] then output {v, u, w}

Algorithm 8: new-vertex-listing. Lists all the triangles containing a given vertex.



Lemma 12 Given any simple compact representation of G, it is possible to list all its triangles containing
a given vertex v in $\Theta(m)$ (optimal) time and $\Theta(n)$ space; Algorithm 8 (new-vertex-listing) achieves this.

Proof: One may see Algorithm 8 (new-vertex-listing) as a way to use the adjacency matrix of G without
explicitly storing it: the array A is nothing but the v-th line of the adjacency matrix. It is constructed
in $\Theta(n)$ time and space (lines 1 and 2). Then one can test for any edge (v, u) in $\Theta(1)$ time and space.
The loop starting at line 3 takes any edge containing one neighbor u of v and tests if its other end (w in
the pseudo-code) is linked to v using A, in $\Theta(1)$ time and space. This is sufficient to find all the triangles
containing v. Since this number of edges is bounded by $2 \cdot m$ (one may actually obtain an equivalent
algorithm by replacing lines 3a and 3aa by a loop over all the edges), we obtain that the algorithm is in
$\Theta(m)$ time and $\Theta(n)$ space.

The obtained time complexity is optimal since v may belong to $\Theta(m)$ triangles. □
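A direct Python transcription of new-vertex-listing is given below as a sketch. We assume vertices are the integers 0..n-1 and adjacency lists are stored in `nbrs`; the extra condition `u < w` is our addition, so that each triangle is reported once rather than once per orientation of the pair (u, w).

```python
def new_vertex_listing(nbrs, v):
    """List the triangles containing v; nbrs[x] is the neighbour list of x."""
    n = len(nbrs)
    mark = [False] * n                 # line 1: the v-th row of the matrix
    for u in nbrs[v]:                  # line 2
        mark[u] = True
    out = []
    for u in nbrs[v]:                  # line 3
        for w in nbrs[u]:              # line 3a
            # Line 3aa, with u < w added to avoid reporting {u, w} twice.
            if mark[w] and u < w:
                out.append((v, u, w))
    return out
```

The boolean array is the only structure whose size depends on n, and the double loop touches each edge incident to a neighbour of v once.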
Input: any sorted simple compact representation of G, and an integer K
Output: all the triangles in G

1. for each vertex v in V:
  1a. if $d(v) > K$ then, using Algorithm 8 (new-vertex-listing):
    1aa. output all triangles {v, u, w} such that $d(u) > K$, $d(w) > K$ and v > u > w
    1ab. output all triangles {v, u, w} such that $d(u) > K$, $d(w) \le K$ and v > u
    1ac. output all triangles {v, u, w} such that $d(u) \le K$, $d(w) > K$ and v > w
2. for each edge (v, u) in E:
  2a. if $d(v) \le K$ and $d(u) \le K$ then:
    2aa. if u < v then output all triangles containing (u, v) using Algorithm 3 (edge-listing)

Algorithm 9: new-listing. Lists all the triangles in a graph.



Theorem 13 Given any sorted simple compact representation of G, it is possible to list all its triangles
in $\Theta(m^{3/2})$ time and $\Theta(n)$ space; Algorithm 9 (new-listing) achieves this if one takes $K \in \Theta(\sqrt{m})$.

Proof: Similarly to the proof we gave of Theorem 9, let us first study the complexity of Algorithm 9
(new-listing) as a function of K. For each vertex v with $d(v) > K$, one lists the triangles
containing v in $\Theta(m)$ time and $\Theta(n)$ space (Lemma 12) (the conditions in lines 1aa to 1ac, as well as the
one in line 2aa, only serve to ensure that each triangle is listed exactly once). Then, one lists the triangles
containing edges whose extremities are of degree at most K; this is done by Algorithm 3 (edge-listing) in
$\Theta(K)$ time and $\Theta(1)$ space for each edge, thus a total in $O(m \cdot K)$ time and $\Theta(1)$ space.

Finally, the space complexity of the whole algorithm is independent of K and is in $\Theta(n)$, and its time
complexity is in $O(\frac{m}{K} \cdot m + m \cdot K)$, since there are $O(\frac{m}{K})$ vertices with degree larger than K. In
order to minimize this, we now take K in $\Theta(\sqrt{m})$, which leads to the announced time complexity. □
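The following Python sketch shows one way to organise new-listing. It is an illustration under assumptions of ours: vertices are 0..n-1, `nbrs[v]` is a sorted adjacency list, and the exact tie-breaking conditions used to report each triangle once are one possible choice (a triangle with at least two high-degree vertices is reported from its largest high-degree vertex; all other triangles are reported from an edge joining two low-degree vertices).

```python
def new_listing(nbrs, K):
    """nbrs[v]: sorted neighbour list of v (vertices 0..n-1); K: threshold."""
    n = len(nbrs)
    deg = [len(a) for a in nbrs]
    mark = [False] * n                      # reused boolean array
    out = []
    # Step 1: high-degree vertices, via the marking technique.
    for v in range(n):
        if deg[v] <= K:
            continue
        for u in nbrs[v]:
            mark[u] = True
        for u in nbrs[v]:
            for w in nbrs[u]:
                if mark[w] and u < w:
                    highs = [x for x in (u, w) if deg[x] > K]
                    # Report from the largest high-degree corner only.
                    if highs and all(x < v for x in highs):
                        out.append((v, u, w))
        for u in nbrs[v]:
            mark[u] = False
    # Step 2: edges between two low-degree vertices, by sorted merge.
    for v in range(n):
        if deg[v] > K:
            continue
        for u in nbrs[v]:
            if u < v and deg[u] <= K:
                Nu, Nv = nbrs[u], nbrs[v]
                i = j = 0
                while i < len(Nu) and j < len(Nv):
                    if Nu[i] < Nv[j]:
                        i += 1
                    elif Nu[i] > Nv[j]:
                        j += 1
                    else:
                        w = Nu[i]
                        # An all-low triangle is kept for its smallest edge.
                        if deg[w] > K or w > v:
                            out.append((u, v, w))
                        i += 1
                        j += 1
    return out
```

Whatever the value of K, the output is the same set of triangles; K only determines how many vertices go through the $\Theta(m)$-time marking step.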

Theorems 11 and 13 improve Theorems 9 and 10 since they show that the same (optimal) time
complexity may be achieved in space $\Theta(n)$ rather than $\Theta(m)$. Moreover, this is space-optimal for
pseudo-listing if one wants to keep the result in memory (the result itself is in $\Theta(n)$), which is generally the case
(for clustering coefficient computations, for instance).

Note however that it is still unknown whether there exist algorithms with time complexity in $\Theta(m^{3/2})$
but with $o(n)$ space requirements. We saw that edge-iterator achieves $\Theta(m \cdot d_{max}) \subseteq O(m \cdot n)$ time and
$\Theta(1)$ space complexities, while needing only a sorted simple compact representation of G.
If we suppose that the representation uses adjacency arrays, we now obtain the following stronger (if
$d_{max} \in \Omega(\sqrt{m \cdot \log(n)})$) result.

Corollary 14 Given the adjacency array representation of G, it is possible to list all its triangles
in $O(m^{3/2} \sqrt{\log(n)})$ time and $\Theta(1)$ space; Algorithm 9 (new-listing) achieves this if one takes
$K \in \Theta(\sqrt{m \cdot \log(n)})$.

Proof: Let us first sort the arrays in $O(m \cdot \log(n))$ time and $\Theta(1)$ space. Then, we change Algorithm 8
(new-vertex-listing) by removing the use of A and replacing line 3aa by a dichotomic search for w in N(v),
which has a cost in $O(\log(n))$ time and $\Theta(1)$ space. Now if Algorithm 9 (new-listing) uses this modified
version of Algorithm 8 (new-vertex-listing), then it is in $\Theta(1)$ space and $O(\frac{m}{K} \cdot m \cdot \log(n) + m \cdot K)$ time.
The optimal value for K is then in $\Theta(\sqrt{m \cdot \log(n)})$, leading to the announced complexity. □
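The modification used in this proof can be sketched in Python with the standard `bisect` module: the boolean array is dropped and membership of w in N(v) is decided by binary search in the sorted adjacency array of v. The function names and the `u < w` filter (which reports each triangle once) are assumptions of this illustration.

```python
from bisect import bisect_left

def contains(sorted_arr, x):
    """Binary search for x in a sorted list."""
    i = bisect_left(sorted_arr, x)
    return i < len(sorted_arr) and sorted_arr[i] == x

def vertex_listing_bsearch(nbrs, v):
    """Triangles containing v, using O(1) extra space besides the output.

    nbrs[x] must be a sorted list of the neighbours of x.
    """
    out = []
    for u in nbrs[v]:
        for w in nbrs[u]:
            # Dichotomic search replaces the test A[w] of line 3aa.
            if u < w and contains(nbrs[v], w):
                out.append((v, u, w))
    return out
```

Each membership test now costs $O(\log(n))$ instead of $\Theta(1)$, which is the source of the logarithmic factor in the corollary.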
5 The case of power-law graphs.



Until now, several results (including ours) took advantage of the fact that most large graphs met in
practice are sparse; designing algorithms with complexities expressed in terms of m rather than n then
leads to significant improvements.

Going further, it has been observed for several years that most large graphs met in practice also
have another important characteristic in common: their degrees are very heterogeneous. More precisely,
in most cases, the vast majority of vertices have a very low degree while some have a huge degree. This
is often captured by the fact that the degree distribution, i.e. the proportion $p_k$ for each k of vertices
of degree k, is well fitted by a power-law: $p_k \sim k^{-\alpha}$ for an exponent $\alpha$ generally between 2 and 3.
Extensive lists of cases in which this property was observed are available in the literature.

We will see that several algorithms proposed in the previous section have provably better performances
on such graphs than on general (sparse) graphs.

Let us first note that there are several ways to model real-world power-law distributions; see for
instance [18, 15]. We use here one of the most simple and classical ones, namely continuous power-laws;
choosing one of the others would lead to similar results. In such a distribution, $p_k$ is taken to be equal
to $\int_k^{k+1} C x^{-\alpha}\, dx$, where C is the normalization constant. This ensures that $p_k$ is proportional to $k^{-\alpha}$
in the limit where k is large. We must moreover ensure that the sum of the $p_k$ is equal to 1:
$\sum_{k=1}^{\infty} p_k = \int_1^{\infty} C x^{-\alpha}\, dx = C \cdot \frac{1}{\alpha - 1} = 1$.
We obtain $C = \alpha - 1$, and finally $p_k = (\alpha - 1) \int_k^{k+1} x^{-\alpha}\, dx = k^{-\alpha+1} - (k+1)^{-\alpha+1}$.
Finally, when we talk about power-law graphs in the following, we refer to graphs in which
the proportion of vertices of degree k is $p_k = k^{-\alpha+1} - (k+1)^{-\alpha+1}$.
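The closed form of $p_k$ makes sums of consecutive terms telescope, which can be checked numerically. The short Python sketch below (function names are ours) computes $p_k$ and the fraction of vertices with degree at least K, namely $K^{-\alpha+1}$, which follows from $\sum_{k=1}^{K-1} p_k = 1 - K^{-\alpha+1}$.

```python
def p(k, alpha):
    """Proportion of vertices of degree k in the continuous power-law model."""
    return k ** (1 - alpha) - (k + 1) ** (1 - alpha)

def fraction_with_degree_at_least(K, alpha):
    """Closed form of the sum of p_k over k >= K, obtained by telescoping."""
    return K ** (1 - alpha)
```

For instance, summing `p(k, alpha)` for k from 1 to K-1 and adding `fraction_with_degree_at_least(K, alpha)` gives 1 up to floating-point error.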

Theorem 15 Given any simple compact representation of a power-law graph G with exponent $\alpha$, it is
possible to list all its triangles in $O(m \cdot n^{1/\alpha})$ time and $\Theta(n)$ space; Algorithm 9 (new-listing) achieves this
if one takes $K \in \Theta(n^{1/\alpha})$, and Algorithm 7 (compact-forward) achieves this too.

Proof: Let us denote by $n_K$ the number of vertices of degree larger than or equal to K. In a power-law
graph with exponent $\alpha$, this number is given by: $\frac{n_K}{n} = \sum_{k=K}^{\infty} p_k$. We have
$\sum_{k=K}^{\infty} p_k = 1 - \sum_{k=1}^{K-1} p_k = 1 - (1 - K^{-\alpha+1}) = K^{-\alpha+1}$. Therefore $n_K = n \cdot K^{-\alpha+1}$.

Let us first prove the result concerning Algorithm 9 (new-listing). As already noticed in the proof of
Theorem 13, its space complexity does not depend on K, and it is $\Theta(n)$. Moreover, its time complexity
is in $O(n_K \cdot m + m \cdot K)$. The value of K that minimizes this is in $\Theta(n^{1/\alpha})$, and the result for Algorithm 9
(new-listing) follows.

Let us now consider the case of Algorithm 7 (compact-forward). The space complexity was already
proved for Theorem 11. The time complexity is the same as the one for Algorithm 6 (forward), and we
use here the same notations as in the proof of Theorem 10. Recall that the vertices are numbered by
decreasing order of their degrees.

Let us study the complexity of the intersection computation (line 3aa in Algorithm 6 (forward)). It is
in $\Theta(|A[u]| + |A[v]|)$. Recall that, at this point of the algorithm, A[v] is nothing but the set of neighbors
of v with number lower than the one of v (and thus of degree at least equal to d(v)). Therefore, $|A[v]|$
is bounded both by d(v) and the number of vertices of degree at least d(v), i.e. $n_{d(v)}$. Likewise, $|A[u]|$
is bounded by d(u) and by $n_{d(v)}$, since A[u] is the set of neighbors of u with degree at least equal to
d(v). Moreover, we have $\eta(u) > \eta(v)$ (line 3a of Algorithm 6 (forward)), and so $|A[u]| \le d(u) \le d(v)$.
Finally, both $|A[u]|$ and $|A[v]|$ are bounded by both d(v) and $n_{d(v)}$, and the intersection computation is
in $O(d(v) + n_{d(v)})$.

(Footnote: Note that if $\alpha$ is a constant then m is in $\Theta(n)$. It may however depend on n, and should be denoted by $\alpha(n)$. In order
to keep the notations simple, we do not use this notation, but one must keep this in mind.)

(Footnote: One may also choose $p_k$ proportional to $\int_{k-1/2}^{k+1/2} x^{-\alpha}\, dx$. Choosing any of this kind of solutions has little impact on the
obtained results, see [15] and the proofs we present in this section.)
Like above, let us compute the value K of d(v) such that these two bounds are equal. We obtain
$K = n^{1/\alpha}$. Then, the computation of the intersection is in $O(K + n_K) = O(n^{1/\alpha})$, and since the number
of such computations is bounded by the number of edges (lines 3 and 3a of Algorithm 6 (forward)), we
obtain the announced complexity. □
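The balance point used twice in this proof, $K = n_K$ with $n_K = n \cdot K^{-\alpha+1}$, can be checked numerically. The helper names below are hypothetical; the sketch only evaluates the two formulas from the proof.

```python
def high_degree_count(n, K, alpha):
    """n_K = n * K^(1 - alpha): vertices of degree at least K in the model."""
    return n * K ** (1 - alpha)

def balanced_threshold(n, alpha):
    """Value of K for which K equals n_K, i.e. K = n^(1 / alpha)."""
    return n ** (1.0 / alpha)
```

With K set to `balanced_threshold(n, alpha)`, both terms of $O(n_K \cdot m + m \cdot K)$ are of order $m \cdot n^{1/\alpha}$, which is the bound stated in the theorem.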



This result significantly improves the known bounds, as soon as $\alpha$ is large enough. This holds in
particular for typical cases met in practice, where $\alpha$ often is between 2 and 3 [13]. It may be seen
as an explanation of the fact that Algorithm 6 (forward) has very good performances on graphs with
heterogeneous degree distributions, as shown experimentally in [32, 33].

One may also use this approach to improve Algorithm 1 (ayz-pseudo-listing) and Algorithm 5
(ayz-listing) in the case of power-law graphs as follows.

Corollary 16 Given any simple compact representation of a power-law graph G with exponent $\alpha$ and its
adjacency matrix, it is possible to solve pseudo-listing, counting and finding on G in
$O(n^{\frac{\omega\alpha + \omega}{\omega\alpha - \omega + 2}})$ time
and $\Theta(n^{\frac{2\alpha + 2}{\omega\alpha - \omega + 2}})$ space; Algorithm 1 (ayz-pseudo-listing) achieves this if one takes K in
$\Theta(n^{\frac{\omega - 1}{\omega\alpha - \omega + 2}})$.

Proof: With the same reasoning as the one in the proof of Theorem 2, one obtains that the algorithm
runs in $O(n \cdot K^2 + (n_K)^{\omega})$ where $n_K$ denotes the number of vertices of degree larger than K. As explained
in the proof of Theorem 15, this is $n_K = n \cdot K^{-\alpha+1}$. Therefore, the best K is such that $n \cdot K^2$ is in
$\Theta(n^{\omega} \cdot K^{\omega(1-\alpha)})$. Finally, K must be in $\Theta(n^{\frac{\omega - 1}{\omega\alpha - \omega + 2}})$. One then obtains the announced time complexity.
The space complexity is bounded by the space needed to construct the adjacency matrix between the
vertices of degree larger than K, thus it is $\Theta((n_K)^2)$, and the result follows. □

If the degree distribution of G follows a power law with exponent $\alpha = 2.5$ (typical for internet graphs
[13]) then this result says that Algorithm 1 (ayz-pseudo-listing) reaches an $O(n^{1.5})$ time and $\Theta(n^{1.26})$
space complexity. If the exponent is larger, then the complexity is even better. Note that one may also
obtain tighter bounds in terms of m and n, for instance using the fact that Algorithm 1 (ayz-pseudo-listing)
has running time in $\Theta(m \cdot K + (n_K)^{\omega})$ rather than $\Theta(n \cdot K^2 + (n_K)^{\omega})$ (see the proofs of Theorem 2
and Corollary 16). We do not detail this here because the obtained results are quite technical and follow
immediately from the ones we detailed.
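The exponents of Corollary 16 are easy to evaluate for given $\alpha$ and $\omega$. The Python sketch below does so; the function name is ours, and the value $\omega = 2.376$ used in the usage note is an assumption (a classical bound on the matrix multiplication exponent), not a value taken from this section.

```python
def ayz_power_law_exponents(alpha, omega):
    """Exponents of n in Corollary 16 for threshold K, time and space."""
    denom = omega * alpha - omega + 2
    return {
        "K": (omega - 1) / denom,                 # K in Theta(n^this)
        "time": (omega * alpha + omega) / denom,  # time in O(n^this)
        "space": (2 * alpha + 2) / denom,         # space in Theta(n^this)
    }
```

With `alpha = 2.5` and `omega = 2.376`, the time exponent is about 1.49 and the space exponent about 1.26, in line with the orders of magnitude quoted above.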

Corollary 17 Given any simple compact representation of a power-law graph G with exponent $\alpha$ and its
adjacency matrix, it is possible to list all its triangles in $O(m \cdot n^{1/\alpha})$ time and $\Theta(n^{2/\alpha})$ space; Algorithm 5
(ayz-listing) achieves this if one takes K in $\Theta(n^{1/\alpha})$.

Proof: The time complexity of Algorithm 5 (ayz-listing) is in $O(m \cdot K + m \cdot n_K)$. The K minimizing this
is such that $K \in \Theta(n_K)$, which is the same condition as the one in the proof of Theorem 15, therefore we
reach the same time complexity. The space complexity is bounded by the size of the adjacency matrix,
i.e. $O((n_K)^2)$. This leads to the announced complexity. □

Notice that this result implies that, for some reasonable values of $\alpha$ (namely $\alpha > 2$) the space
complexity is in $o(n)$. This however is of theoretical interest only: it relies on the use of both the
adjacency matrix and a simple compact representation of G, which is infeasible in practice for large
graphs.

Finally, the results presented in this section show that one may use properties of most large graphs
met in practice (here, their heterogeneous degree distribution), to improve results known on the general
case (or on the sparse graph case). As we discuss further in Section 7, using such properties in the design
of algorithms is a promising direction for algorithmic research on very large graphs met in practice.
We note however that we have no lower bound for the complexity of triangle listing with the assumption
that the graph is a power-law one (which we had for general and sparse graphs); actually, we do
not even have a proof of the fact that the given bound is tight for the presented algorithms. One may
therefore prove that they have even better performance (or that the bound is tight), and algorithms faster
than the ones presented here may exist (for power-law graphs).

6 Experimental evaluation. 

In [32], the authors present a wide set of experiments on both real-world complex networks and
some generated using various models, to evaluate experimentally the known algorithms. They focus on
vertex-iterator, edge-iterator, Algorithm 6 (forward), and Algorithm 5 (ayz-listing), together with their
counting and pseudo-listing variants (they compute clustering coefficients). They also study variants of
these algorithms using for instance hashtables and balanced trees. These variants have the same worst
case asymptotic complexities but one may guess that they would run faster than the original algorithms,
for several reasons we do not detail here. Matrix approaches are considered as too intricate to be used in
practice.

The overall conclusion of their extensive experiments is that Algorithm 6 (forward) performs best
on real-world (sparse and power-law) graphs: its asymptotic time is optimal and the constants involved
in its implementation are very small. Variants, which need more subtle data structures, actually fail to
perform better in most cases (because of the overhead induced by the management of these structures).

In order to integrate our contribution in this context and have a precise idea of the behavior of the
discussed algorithms in practice, we also performed a wide set of experiments. They confirm that
Algorithm 6 (forward) is very fast and outperforms classical approaches significantly. They also show
that, even in the cases where available memory is sufficient for this algorithm, it is outperformed by
Algorithm 7 (compact-forward) because the latter avoids management of additional data structures.

Note that Algorithm 9 (new-listing), just like Algorithm 1 (ayz-pseudo-listing) and Algorithm 5
(ayz-listing), suffers from a serious drawback: it relies on the choice of a relevant value for K, the
degree above which vertices are considered as having a high degree. Though in theory this is not a problem,
in practice it may be quite difficult to determine the best value for K, i.e. the one that minimizes the
execution time. It depends both on the machine running the program and on the graph under concern.
One may evaluate the best K in a preprocessing step at running time, by measuring the time needed to
perform the key steps of the algorithm for various K. This can be done without changing the asymptotic
complexity. However, there is a much simpler way to choose K, with negligible loss in performance,
which we discuss below. Until then, we suppose that we were able to determine the best value for K.

With this best value given, the performances of Algorithm 9 (new-listing) are similar to the ones of
Algorithm 6 (forward); its space requirements are much lower, as predicted by Theorem 13. Likewise,
the speed of Algorithm 9 (new-listing) is close to the one of Algorithm 7 (compact-forward) and it has the same
space requirements.

It is important to notice that the use of compact algorithms, namely Algorithm 7 (compact-forward)
and Algorithm 9 (new-listing), makes it possible to manage graphs that were previously out of reach
because of space requirements. To illustrate this, we present now an experiment on a huge graph which
previous algorithms were unable to manage in our 8 GigaBytes memory machine. This experiment also
has the advantage of being representative of what we observed on a wide variety of instances.

(Footnote: Optimized implementations are provided at [25].)

The graph we consider here is a web graph provided by the WebGraph project. It contains all the
web pages in the .uk domain discovered during a crawl conducted from the 11th of July, 2005, at 00:51,
to the 30th at 10:56 using UbiCrawler [11]. It has n = 39,459,925 vertices and m = 783,027,125 (undirected)
edges, leading to more than 6 GigaBytes of memory usage if stored in (sorted) (uncompressed)
adjacency arrays, each vertex being encoded in 4 bytes as an integer between 0 and n - 1. Its degree
distribution is plotted in Figure 1 (left), showing that the degrees are very heterogeneous and reasonably well
fitted by a power-law of exponent $\alpha = 2.5$. It contains 304,529,576 triangles.
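The memory figure quoted above can be recovered with a short computation. The sketch assumes that each undirected edge is stored twice (once in the array of each endpoint) and counts only the neighbour entries, ignoring any per-vertex index; both are assumptions of ours about the storage layout.

```python
def adjacency_array_bytes(m, bytes_per_vertex=4):
    """Bytes used by the neighbour entries of uncompressed adjacency arrays.

    Assumption: each of the m undirected edges appears in two arrays.
    """
    return 2 * m * bytes_per_vertex
```

For m = 783,027,125 this gives 6,264,217,000 bytes, consistent with the "more than 6 GigaBytes" stated in the text.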

Let us insist on the fact that Algorithm 6 (forward), as well as the ones based on adjacency matrices,
are unable to manage this graph on our 8 GigaBytes memory machine. In contrast, and despite the fact that
it is quite slow, edge-iterator, with its $\Theta(1)$ space complexity, can handle it. It took approximately 41
hours to solve pseudo-listing on this graph with this algorithm on our machine.

Algorithm 7 (compact-forward) achieves much better results: it took approximately 20 minutes.
Likewise, Algorithm 9 (new-listing) took around 45 minutes (depending on the value of K). This is probably
close to what Algorithm 6 (forward) would achieve in 16 GigaBytes of central memory.

Figure 1: Left: the degree distribution of our graph. Right: the execution time (in minutes) as a function
of the number of vertices considered as high degree ones.

We plot in Figure 1 (right) the running time of new-vertex-listing as a function of the number of
vertices with degree larger than K, for varying values of K. Surprisingly, this plot shows clearly
that the running time drops drastically as soon as a few vertices are considered as high degree
ones. This may be seen as a consequence of the fact that edge-iterator is very efficient when the
maximal degree is bounded; handling the high degree vertices with new-vertex-listing and then the
low degree ones with edge-iterator therefore leads to good performance. In other words, the few
high degree vertices (which may be observed on the degree distribution plotted in Figure 1, left)
are responsible for the poor performance of edge-iterator.

When K decreases, the number of vertices with degree larger than K increases, and the running time
keeps decreasing for a while. It reaches a minimum and then grows again. The other important point
is that this growth is very slow, and thus the performance of the algorithm remains close to its
best for a wide range of values of K. This implies that the algorithm performs well with any
reasonable guess for K.
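
The threshold idea discussed above can be sketched as follows. This is only an illustration under
our own assumptions, not the pseudocode of the algorithms evaluated here: vertices of degree
larger than K are processed first by marking their neighborhood and scanning the neighborhoods of
their neighbors, then removed; the remaining low degree part is handled edge by edge. Python sets
stand in for the boolean array and the sorted adjacency arrays, and the name `hybrid_listing` is
hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triangle = Tuple[int, int, int]


def hybrid_listing(adj: Dict[int, Set[int]], K: int) -> List[Triangle]:
    """List each triangle once; vertices with degree > K are handled first."""
    # Work on a copy so that processed high degree vertices can be removed
    # without modifying the caller's graph.
    g = {v: set(ns) for v, ns in adj.items()}
    triangles: List[Triangle] = []

    # Phase 1: high degree vertices (degree measured in the original graph).
    high = [v for v in g if len(g[v]) > K]
    for v in high:
        marked = g[v]  # plays the role of a boolean array over the vertices
        for u in marked:
            for w in g[u]:
                if u < w and w in marked:
                    triangles.append(tuple(sorted((v, u, w))))
        # Remove v so that its triangles are not reported again later.
        for u in marked:
            g[u].discard(v)
        g[v] = set()

    # Phase 2: the remaining vertices all have degree at most K;
    # intersect the neighborhoods of the endpoints of each remaining edge.
    for u in g:
        for v in g[u]:
            if u < v:
                for w in g[u] & g[v]:
                    if w > v:
                        triangles.append((u, v, w))
    return triangles
```

Whatever the value of K, the output is the same set of triangles; only the split of the work
between the two phases changes, which is what the running-time curve of Figure 1 (right) explores.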

7 Conclusion. 

In this contribution, we gave a detailed survey of existing results on triangle problems, and we
extended them in two directions. First, we gave the space complexity of each previously known
algorithm. Second, we proposed new algorithms that achieve both optimal time complexity and low
space needs. Taking space requirements into account is a key issue in this context, since space is
currently the bottleneck for triangle problems when the considered graphs are very large. This is
discussed on a practical case in Section 6, where we show that our compact algorithms make it
possible to handle cases that were previously out of reach.

Another significant contribution of this paper is the analysis of algorithm performance on
power-law graphs, which model a wide variety of very large graphs met in practice. We were able to
show that, on such graphs, several algorithms perform better than in the general (sparse) case.
Finally, the current state of the art concerning triangle problems, including our new results, may be
summarized as follows:

• except for the fact that pseudo-listing may have a $\Theta(n)$ space overhead (depending on the
underlying algorithm), there is no known difference in time and space complexities between finding,
counting, and pseudo-listing;

• the fastest known algorithms for these three problems rely on matrix product and are in
$O(n^{2.376})$ time and $\Theta(n^2)$ space, or in $O(m^{1.41})$ time and $O(m^{1.185})$ space (see the
corresponding theorems); however, no lower bound better than the trivial $\Omega(m)$ one is known
for the time complexity of these problems;

• the other known algorithms rely on solutions to the listing problem and have the same performance
as for this problem; they are slower than matrix approaches but need less space;

• listing can be solved in $\Theta(n^3)$ or $\Theta(n \cdot m)$ (optimal in the general case) time and
$\Theta(1)$ (optimal) space (see the corresponding theorems); this can be achieved from a sorted simple
compact representation of the graph;

• listing may also be solved in $\Theta(m^{3/2})$ (optimal in the general and sparse cases) time and
$\Theta(n)$ space (see the corresponding theorems), still from a simple compact representation of the
graph; this is much better for sparse graphs;

• in the case of power-law graphs, it is possible to prove better complexities, leading to
$O(m \cdot n^{1/\alpha})$ time and $\Theta(n)$ space solutions, where $\alpha$ is the exponent of the
power-law (see the corresponding theorem);

• in practice, it is possible to obtain very good performance (concerning both time and space needs)
using new-listing and compact-forward.

We detailed several other results, but they are weaker than these (they need the adjacency matrix
of the graph as input and/or have higher complexities).
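
As an illustration of how $\Theta(m^{3/2})$ listing can be obtained from adjacency arrays, the sketch
below follows the general degree-ordering technique behind forward-style algorithms: vertices are
ranked by decreasing degree, each vertex keeps only its neighbors of larger rank, and the truncated
arrays of the two endpoints of each edge are intersected by a merge. It is a simplified variant
written for readability under our own assumptions (the name `forward_style_listing` is ours). In
particular, it builds a separate truncated copy of the graph, so it does not reproduce the space
behavior of compact-forward.

```python
from typing import Dict, List, Tuple


def forward_style_listing(adj: Dict[int, List[int]]) -> List[Tuple[int, int, int]]:
    """List each triangle of an undirected simple graph exactly once."""
    # Rank vertices by decreasing degree (ties broken by vertex identifier).
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    rank = {v: r for r, v in enumerate(order)}
    # For each vertex, keep the ranks of its neighbors of larger rank, sorted.
    higher = {v: sorted(rank[u] for u in adj[v] if rank[u] > rank[v]) for v in adj}

    triangles = []
    for v in order:
        a = higher[v]
        for ru in a:
            b = higher[order[ru]]
            # Merge-style intersection of the two sorted truncated arrays:
            # a common rank corresponds to a vertex w ranked after both
            # v and u, so each triangle is reported from its first vertex only.
            i = j = 0
            while i < len(a) and j < len(b):
                if a[i] < b[j]:
                    i += 1
                elif a[i] > b[j]:
                    j += 1
                else:
                    triangles.append(tuple(sorted((v, order[ru], order[a[i]]))))
                    i += 1
                    j += 1
    return triangles
```

The degree ordering is what keeps the truncated arrays short for high degree vertices, which is
also why this family of algorithms behaves well on heterogeneous degree distributions.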

This contribution also opens a set of questions for further research, most of them related to the tradeoff 
between space and time efficiency. Let us cite for instance: 

• can matrix approaches be modified in order to reduce their space complexity?

• is listing feasible in $o(n)$ space, while still in optimal time $\Theta(m^{3/2})$?

• is it possible to design a listing algorithm with complexity $o(m \cdot n^{1/\alpha})$ time and
$o(n)$ space for power-law graphs with exponent $\alpha$? what is the optimal time complexity in this
case?

It is also important to notice that other approaches exist, based for instance on streaming algorithmics 
(avoiding to store the graph in central memory) [22\ ^ and / or approximate algorithms [M\ I24| I34j , 
and / or various methods to compress the graph [HI (HI . These approaches are very promising for graphs 
even larger than the ones considered here, in particular the ones that do not fit in central memory. 

Another interesting approach would be to express the complexity of triangle algorithms in terms of 
the number of triangles in the graph (and of its size). Indeed, it may be possible to achieve much better 
performance for listing algorithms if the graph contains few triangles. Likewise, it is reasonable to expect 
that triangle listing, but also pseudo-listing and counting, may perform poorly if there are many triangles 
in the graph. The finding problem, on the contrary, may be easier on graphs having many triangles. To 
our knowledge, this direction has not yet been explored. 

Finally, the results we present on power-law graphs take advantage of the fact that most very large
graphs considered in practice may be approximated by power-law graphs. It is not the first time that
algorithms for triangle problems use underlying graph properties to get improved performance. For instance, results
on planar graphs are provided in [2SI, and results using arboricity in ^lOl- It however appeared quite 
recently that many large graphs met in practice have some nontrivial (statistical) properties in common,
and using these properties in the design of efficient algorithms is still in its infancy. We consider
this a key direction for further research.